Overview of the TREC 2022 deep learning track
Craswell, Nick, Mitra, Bhaskar, Yilmaz, Emine, Campos, Daniel, Lin, Jimmy, Voorhees, Ellen M., Soboroff, Ian
At TREC 2022, we hosted the fourth TREC Deep Learning Track, continuing our focus on benchmarking ad hoc retrieval methods in the large-data regime. As in previous years [Craswell et al., 2020, 2021a, 2022], we leverage the MS MARCO datasets [Bajaj et al., 2016], which made hundreds of thousands of human-annotated training labels available for both passage and document ranking tasks. In addition, last year we refreshed both the passage and the document collections, which led to a nearly 16-fold increase in the size of the passage collection and a nearly four-fold increase in the document collection size. Beyond evaluating ranking methods on the larger collections, the data refresh also aimed at providing additional metadata (e.g., passage-to-document mappings) that may be useful for ranking, as well as incorporating fixes for known text encoding issues in previous versions of the datasets. This year we continue to benchmark against these larger passage and document collections. However, the significant increase in collection sizes last year led to a corresponding increase in the number of relevant results in the collection per query, and the existing judgment budget was exceeded before a reasonably complete set of these relevant results could be identified by the NIST judges. This large number of relevant results raised serious concerns about the test collection generated by last year's track, relating to reusability and also score saturation [Voorhees et al., 2022, Craswell et al., 2022]. To address these concerns, we made three changes this year with the goal of reducing the number of relevant results per query, and the judgment costs in general, so that the budget could be used to obtain a more complete set of judgments and consequently a more reusable test collection: [1] We used test queries that did not contribute to the MS MARCO corpus.
Overview of the TREC 2021 deep learning track
Craswell, Nick, Mitra, Bhaskar, Yilmaz, Emine, Campos, Daniel, Lin, Jimmy
At TREC 2021, we hosted the third TREC Deep Learning Track, continuing our focus on benchmarking ad hoc retrieval methods in the large-data regime. As in previous years [Craswell et al., 2020, 2021a], we leverage the MS MARCO datasets [Bajaj et al., 2016], which made hundreds of thousands of human-annotated training labels available for both passage and document ranking tasks. In addition, this year we refreshed both the document and the passage collections, which led to a nearly four-fold increase in the document collection size and a nearly 16-fold increase in the size of the passage collection. Beyond evaluating retrieval methods on the larger collections, the data refresh also aimed to provide additional metadata (e.g., passage-to-document mappings) that may be useful for ranking, as well as to incorporate fixes for known text encoding issues in previous versions of the datasets. This year, in addition to focusing on TREC-style blind evaluation of neural methods against strong traditional baselines, the track also encouraged participating groups to annotate their runs based on whether they employ dense retrieval methods and whether their ranking pipeline is a single-stage retrieval process. The goal was both to encourage more exploration of neural methods in first-stage retrieval and to allow analysis of how these emerging methods compare to the previous state of the art. Deep neural ranking models that employ large-scale pretraining continued to outperform traditional retrieval methods this year. We also found that single-stage retrieval can achieve good performance on both tasks, although it still does not perform on par with multistage retrieval pipelines. Finally, the increase in the collection size and the general data refresh raised some questions about the completeness of NIST judgments and the quality of the training labels that were mapped to the new collections from the old ones, which we discuss later in this report.